AITopics | user-defined criteria

Collaborating Authors

user-defined criteria

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Assisting humans in complex comparisons: automated information comparison at scale

Yuen, Truman, Watt, Graham A., Lawryshyn, Yuri

arXiv.org Artificial IntelligenceApr-5-2024

Generative Large Language Models enable efficient analytics across knowledge domains, rivalling human experts in information comparisons. However, the applications of LLMs for information comparisons face scalability challenges due to the difficulties in maintaining information across large contexts and overcoming model token limitations. To address these challenges, we developed the novel Abstractive Summarization \& Criteria-driven Comparison Endpoint (ASC$^2$End) system to automate information comparison at scale. Our system employs Semantic Text Similarity comparisons for generating evidence-supported analyses. We utilize proven data-handling strategies such as abstractive summarization and retrieval augmented generation to overcome token limitations and retain relevant information during model inference. Prompts were designed using zero-shot strategies to contextualize information for improved model reasoning. We evaluated abstractive summarization using ROUGE scoring and assessed the generated comparison quality using survey responses. Models evaluated on the ASC$^2$End system show desirable results providing insights on the expected performance of the system. ASC$^2$End is a novel system and tool that enables accurate, automated information comparison at scale across knowledge domains, overcoming limitations in context length and retrieval.

information, information comparison, llm, (16 more...)

arXiv.org Artificial Intelligence

2404.04351

Country:

North America > Canada > Ontario > Toronto (0.28)
North America > United States (0.28)
Europe > Belgium (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.93)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria

Kim, Tae Soo, Lee, Yoonjoo, Shin, Jamin, Kim, Young-Ho, Kim, Juho

arXiv.org Artificial IntelligenceSep-24-2023

By simply composing prompts, developers can prototype novel generative applications with Large Language Models (LLMs). To refine prototypes into products, however, developers must iteratively revise prompts by evaluating outputs to diagnose weaknesses. Formative interviews (N=8) revealed that developers invest significant effort in manually evaluating outputs as they assess context-specific and subjective criteria. We present EvalLM, an interactive system for iteratively refining prompts by evaluating multiple outputs on user-defined criteria. By describing criteria in natural language, users can employ the system's LLM-based evaluator to get an overview of where prompts excel or fail, and improve these based on the evaluator's feedback. A comparative study (N=12) showed that EvalLM, when compared to manual evaluation, helped participants compose more diverse criteria, examine twice as many outputs, and reach satisfactory prompts with 59% fewer revisions. Beyond prompts, our work can be extended to augment model evaluation and alignment in specific application contexts.

interactive evaluation, language model prompt, user-defined criteria, (1 more...)

arXiv.org Artificial Intelligence

2309.13633

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback